Learning neural trans-dimensional random field language models with noise-contrastive estimation
Trans-dimensional random field language models (TRF LMs), in which sentences are
modeled as a collection of random fields, have shown performance close to that of
LSTM LMs in speech recognition and are computationally more efficient in
inference. However, the training efficiency of neural TRF LMs is not
satisfactory, which limits the scalability of TRF LMs to large training corpora.
In this paper, several techniques for both model formulation and parameter
estimation are proposed to improve the training efficiency and the performance
of neural TRF LMs. First, TRFs are reformulated in the form of exponential
tilting of a reference distribution. Second, noise-contrastive estimation (NCE)
is introduced to jointly estimate the model parameters and normalization
constants. Third, we extend neural TRF LMs by combining a deep convolutional
neural network (CNN) and a bidirectional LSTM in the potential function to
extract deep hierarchical features and bidirectional sequential features.
Together, these techniques enable neural TRF LMs to be trained successfully and
efficiently on a 40x larger training set in only 1/3 of the training time, and
yield a further 4.7% relative WER reduction on top of a strong LSTM LM baseline.
Comment: 5 pages and 2 figures
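To make the exponential-tilting form and the NCE idea concrete, here is a toy sketch in Python with NumPy. It is not the authors' implementation: it assumes a small discrete sample space instead of sentences of varying length, replaces the CNN/BiLSTM potential with a hypothetical lookup table theta[x], and uses the reference distribution q itself as the NCE noise distribution. The model is p(x) = q(x) exp(theta[x] - zeta), where zeta is a log normalization constant learned jointly with theta by classifying data samples against nu noise samples per data sample.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def nce_fit(data, q, nu=5, steps=2000, lr=0.5, seed=0):
    """Toy NCE fit of p(x) = q(x) * exp(theta[x] - zeta) on outcomes 0..K-1.

    Hypothetical sketch: theta is a lookup-table potential, zeta a learned
    log normalization constant, and q doubles as the noise distribution.
    """
    rng = np.random.default_rng(seed)
    K, n = len(q), len(data)
    theta = np.zeros(K)
    zeta = 0.0
    for _ in range(steps):
        noise = rng.choice(K, size=nu * n, p=q)   # nu noise samples per data sample
        # Logit of "came from data": log p_model(x) - log(nu * q(x)); log q cancels.
        s_data = theta[data] - zeta - np.log(nu)
        s_noise = theta[noise] - zeta - np.log(nu)
        w_data = 1.0 - sigmoid(s_data)    # d/ds log sigmoid(s)
        w_noise = -sigmoid(s_noise)       # d/ds log(1 - sigmoid(s))
        # Gradient ascent on the NCE objective (averaged over data samples).
        grad_theta = (np.bincount(data, weights=w_data, minlength=K)
                      + np.bincount(noise, weights=w_noise, minlength=K)) / n
        grad_zeta = -(w_data.sum() + w_noise.sum()) / n
        theta += lr * grad_theta
        zeta += lr * grad_zeta
    return theta, zeta

# Usage on a toy 4-outcome problem.
rng = np.random.default_rng(1)
q = np.full(4, 0.25)                                   # reference distribution
data = rng.choice(4, size=2000, p=[0.1, 0.2, 0.3, 0.4])
theta, zeta = nce_fit(data, q)
p_model = q * np.exp(theta - zeta)                     # tilted distribution
print(np.round(p_model, 3), round(float(p_model.sum()), 3))
```

Because the noise distribution equals q here, the log q terms cancel and the classifier logit reduces to theta[x] - zeta - log(nu). On its own, zeta is only determined up to a shift shared with theta; what NCE pins down is the tilted distribution itself, which should approximately sum to one without ever computing the normalizer explicitly.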
Joint Bayesian Gaussian discriminant analysis for speaker verification
State-of-the-art i-vector-based speaker verification relies on variants of
Probabilistic Linear Discriminant Analysis (PLDA) for discriminant analysis. We
are mainly motivated by recent work on the joint Bayesian (JB) method,
which was originally proposed for discriminant analysis in face verification. We
apply JB to speaker verification and make three contributions beyond the
original JB. 1) In contrast to the EM iterations with approximated statistics
in the original JB, we employ EM iterations with exact statistics, which
give better performance. 2) We propose simultaneous diagonalization (SD)
of the within-class and between-class covariance matrices to achieve efficient
testing; SD has a broader scope of application than the SVD-based efficient
testing method in the original JB. 3) We scrutinize the similarities and
differences between various Gaussian PLDAs and JB, complementing the previous
analysis, which compared JB only with Prince-Elder PLDA. Extensive experiments
on NIST SRE10 core condition 5 empirically validate the superiority of JB,
with a faster convergence rate and a 9-13% EER reduction compared with
state-of-the-art PLDA.
Comment: accepted by ICASSP201
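The efficient-testing idea behind simultaneous diagonalization can be illustrated with a small NumPy sketch. It assumes a zero-mean two-covariance Gaussian model (an i-vector is a speaker factor with between-class covariance S_b plus within-class noise with covariance S_w), computes SD with a Cholesky factorization followed by a symmetric eigendecomposition (one standard way to obtain SD, not necessarily the paper's exact procedure), and then scores a trial as a sum over independent dimensions. All function names are hypothetical; the brute-force jb_llr_full is included only to check the fast score.

```python
import numpy as np

def simultaneous_diagonalize(S_w, S_b):
    """Find V with V.T @ S_w @ V = I and V.T @ S_b @ V = diag(lam)."""
    L = np.linalg.cholesky(S_w)        # S_w = L @ L.T (S_w must be positive definite)
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_b @ L_inv.T          # S_b expressed in the S_w-whitened space
    lam, U = np.linalg.eigh(M)         # symmetric eigendecomposition M = U diag(lam) U.T
    return L_inv.T @ U, lam

def jb_llr_fast(x1, x2, V, lam):
    """Same-vs-different log-likelihood ratio; after projecting with V,
    each dimension is scored independently."""
    a, b = V.T @ x1, V.T @ x2
    # Per-dimension 2x2 Gaussian ratio with within-class variance 1, between-class lam.
    c_sq = lam**2 / ((2 * lam + 1) * (lam + 1))
    c_ab = lam / (2 * lam + 1)
    const = np.log(lam + 1) - 0.5 * np.log(2 * lam + 1)
    return float(np.sum(-0.5 * c_sq * (a**2 + b**2) + c_ab * a * b + const))

def jb_llr_full(x1, x2, S_w, S_b):
    """Reference ratio computed from the full 2d-dimensional joint Gaussians."""
    d = len(x1)
    z = np.concatenate([x1, x2])
    T = S_b + S_w
    Z = np.zeros((d, d))
    C_same = np.block([[T, S_b], [S_b, T]])   # shared speaker factor -> cross-covariance S_b
    C_diff = np.block([[T, Z], [Z, T]])       # independent speakers
    def logpdf(C):
        _, logdet = np.linalg.slogdet(C)
        return -0.5 * (z @ np.linalg.solve(C, z) + logdet + 2 * d * np.log(2 * np.pi))
    return logpdf(C_same) - logpdf(C_diff)

# Usage with random covariance matrices (toy data, not i-vectors).
rng = np.random.default_rng(0)
d = 5
A, B = rng.normal(size=(d, d)), rng.normal(size=(d, 3))
S_w = A @ A.T + d * np.eye(d)      # within-class covariance
S_b = B @ B.T                      # between-class covariance (rank 3)
V, lam = simultaneous_diagonalize(S_w, S_b)
x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(jb_llr_fast(x1, x2, V, lam), jb_llr_full(x1, x2, S_w, S_b))
```

Since V and lam depend only on the trained covariances, they can be computed once; each i-vector is then projected once, and every trial reduces to a sum over d independent dimensions instead of solving a 2d x 2d system.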
Tracking of enriched dialog states for flexible conversational information access
Dialog state tracking (DST) is a crucial component in a task-oriented dialog
system for conversational information access. A common practice in current
dialog systems is to define the dialog state by a set of slot-value pairs. This
representation of dialog states and slot-filling-based DST have been widely
employed, but suffer from three drawbacks: (1) the dialog state can contain
only a single value for a slot, and (2) it can contain only users' affirmative
preferences over the values for a slot; (3) current task-based dialog systems
mainly focus on the searching task, while the enquiring task is also very
common in practice. These observations motivate us to enrich the current
representation of dialog states and to collect a new dialog dataset about
movies, on the basis of which we build a new DST, called enriched DST (EDST),
for flexible access to movie information. The EDST supports the searching task,
the enquiring task, and their mixed task. We show that the new EDST method not
only achieves good results on the Iqiyi dataset, but also outperforms other
state-of-the-art DST methods on the traditional dialog datasets WOZ2.0 and
DSTC2.
Comment: 5 pages, 2 figures, accepted by ICASSP201
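The enriched state representation described above can be sketched as a small Python data structure. This is only a hypothetical illustration of what such a state must be able to hold, not the EDST tracker itself (which is a learned model): each slot keeps a set of affirmed values and a set of negated values, so it can carry several values and negative preferences, and a separate set records the attributes the user enquires about. The class names, slot names, and values below are made up.

```python
from dataclasses import dataclass, field

@dataclass
class SlotState:
    liked: set[str] = field(default_factory=set)     # affirmative preferences (several allowed)
    disliked: set[str] = field(default_factory=set)  # negative preferences

@dataclass
class EnrichedDialogState:
    slots: dict[str, SlotState] = field(default_factory=dict)
    requested: set[str] = field(default_factory=set)  # attributes asked about (enquiring task)

    def _slot(self, slot: str) -> SlotState:
        return self.slots.setdefault(slot, SlotState())

    def affirm(self, slot: str, value: str) -> None:
        s = self._slot(slot)
        s.liked.add(value)
        s.disliked.discard(value)   # a later affirmation overrides an earlier negation

    def negate(self, slot: str, value: str) -> None:
        s = self._slot(slot)
        s.disliked.add(value)
        s.liked.discard(value)      # a later negation overrides an earlier affirmation

    def request(self, attribute: str) -> None:
        self.requested.add(attribute)

# Hypothetical turn: "A comedy or an animation, but nothing starring Actor A.
# And who directed it?"
state = EnrichedDialogState()
state.affirm("genre", "comedy")
state.affirm("genre", "animation")
state.negate("actor", "Actor A")
state.request("director")
print(state)
```

A flat slot-value dictionary could store only one genre and no negation for this turn; keeping searching constraints and enquired attributes side by side is what lets one state cover the searching, enquiring, and mixed tasks.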
- …